Discovering and Reconciling Value Conflicts for Data Integration
Authors
Abstract
The integration of data from autonomous and heterogeneous sources calls for the prior identification and resolution of any semantic conflicts that may be present. Unfortunately, this requires the system integrator to sift painstakingly through the data from disparate systems. In this paper, we suggest that this process can be at least partially automated, and we present a methodology and techniques for discovering potential semantic conflicts as well as the underlying data transformations needed to resolve them. Our methodology begins by classifying data value conflicts into two categories: context independent and context dependent. While context independent conflicts are usually caused by unexpected errors, context dependent conflicts result primarily from the heterogeneity of the underlying data sources. To facilitate data integration, data value conversion rules are proposed to describe the quantitative relationships among data values involved in context dependent conflicts. A general approach is proposed to discover data value conversion rules from the data. The approach consists of five major steps: relevant attribute analysis, candidate model selection, conversion function generation, conversion function selection, and conversion rule formation. It is being implemented in a prototype system, DIRECT, for business data using statistics-based techniques. A preliminary study indicates that the proposed approach is promising.
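The five steps named in the abstract can be illustrated with a small sketch. This is not the DIRECT implementation; it is a minimal example under stated assumptions: records from the two sources are already matched one-to-one, relevance is judged by Pearson correlation, and only two candidate models (a pure ratio and a linear function) are considered. All function names, the `min_corr` threshold, and the sample attributes are hypothetical.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5

def fit_ratio(xs, ys):
    """Least-squares fit of y = k * x (no intercept)."""
    k = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return (k, 0.0)

def fit_linear(xs, ys):
    """Least-squares fit of y = k * x + c."""
    mx, my = mean(xs), mean(ys)
    k = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return (k, my - k * mx)

def sse(params, xs, ys):
    """Sum of squared residuals for y = k * x + c."""
    k, c = params
    return sum((y - (k * x + c)) ** 2 for x, y in zip(xs, ys))

def discover_rule(target, candidates, min_corr=0.9, tolerance=1e-6):
    # Step 1: relevant attribute analysis (assumed here: correlation filter).
    relevant = {name: xs for name, xs in candidates.items()
                if abs(pearson(xs, target)) >= min_corr}
    best = None
    for name, xs in relevant.items():
        # Steps 2-3: candidate model selection and function generation.
        for model, fit in (("ratio", fit_ratio), ("linear", fit_linear)):
            params = fit(xs, target)
            err = sse(params, xs, target)
            # Step 4: function selection. A later (more complex) model must
            # beat the current best by more than the tolerance.
            if best is None or err < best[0] - tolerance:
                best = (err, name, model, params)
    if best is None:
        return None
    # Step 5: conversion rule formation.
    _, name, model, (k, c) = best
    return {"source_attribute": name, "model": model, "k": k, "c": c}

# Hypothetical matched records: source B reports twice source A's price.
source_a = {"price_a": [10, 20, 30, 40], "quantity": [5, 1, 4, 2]}
print(discover_rule([20, 40, 60, 80], source_a))
```

Trying the ratio model first and requiring the linear model to improve the error by more than `tolerance` is one simple way to prefer the simpler conversion function when both fit equally well; a statistics-based system could instead use significance tests or an information criterion.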
Similar resources
Conversion Rules from Disparate Data Sources
The successful integration of data from autonomous and heterogeneous systems calls for the resolution of semantic conflicts that may be present. Such conflicts are often reflected by discrepancies in attribute values of the same data object. In this paper, we describe a recently developed prototype system, DIRECT (DIscovering and REconciling ConflicTs). The system mines data value conversion ru...
Discovering and reconciling value conflicts for numerical data integration
The build-up in information technology capital, fueled by the Internet and the cost-effectiveness of new telecommunications technologies, has led to a proliferation of information systems that are in dire need of exchanging information but are incapable of doing so due to the lack of semantic interoperability. It is now evident that physical connectivity (the ability to exchange bits and bytes) is no longe...
Discovering and Reconciling Semantic Conflicts: A Data Mining Perspective
Current approaches to semantic interoperability require human intervention in detecting potential conflicts and in defining how those conflicts may be resolved. This is a major impedance to achieving "logical connectivity", especially when the number of disparate sources is large. In this paper, we demonstrate that the detection and reconciliation of semantic conflicts can be automated using to...
DIRECT: a system for mining data value conversion rules from disparate data sources
The successful integration of data from autonomous and heterogeneous systems calls for the resolution of semantic conflicts that may be present. Such conflicts are often reflected by discrepancies in attribute values of the same data object. In this paper, we describe a recently developed prototype system, DIRECT (DIscovering and REconciling ConflicTs). The system mines data value conversion ru...
Reconciling Continuous Attribute Values from Multiple Data Sources
Because of the heterogeneous nature of different data sources, data integration is often one of the most challenging tasks in managing modern information systems. The challenges exist at three different levels: schema heterogeneity, entity heterogeneity, and data heterogeneity. The existing literature has largely focused on schema heterogeneity and entity heterogeneity; and the very limited wor...
Journal title:
Volume, issue:
Pages: -
Publication date: 1999